Policy oscillation is overshooting
Abstract
A majority of approximate dynamic programming approaches to the reinforcement learning problem can be categorized into greedy value function methods and value-based policy gradient methods. The former approach, although fast, is well known to be susceptible to the policy oscillation phenomenon. We take a fresh view of this phenomenon by casting, within the context of non-optimistic policy iteration, a considerable subset of the former approach as a limiting special case of the latter. We explain the phenomenon in terms of this view and illustrate the underlying mechanism with artificial examples. We also use it to derive the constrained natural actor-critic algorithm that can interpolate between the aforementioned approaches. In addition, it has been suggested in the literature that the oscillation phenomenon might be subtly connected to the grossly suboptimal performance in the Tetris benchmark problem of all attempted approximate dynamic programming methods. Based on empirical findings, we offer a hypothesis that might explain the inferior performance levels and the associated policy degradation phenomenon, and which would partially support the suggested connection. Finally, we report scores in the Tetris problem that improve on existing dynamic programming based results by an order of magnitude.
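The idea of a greedy method as a limiting case of a softer, gradient-style policy can be illustrated with a minimal sketch. It assumes a softmax (Gibbs) policy over a handful of hypothetical action values; the function names, the temperature parameter, and the numbers are illustrative assumptions and are not taken from the paper, whose constrained natural actor-critic algorithm is not reproduced here. The sketch shows two things: the softmax policy approaches the greedy policy as the temperature shrinks, and a tiny change in the action values flips the greedy policy completely while barely moving the soft one, which is one way to picture a greedy update overshooting.

```python
import math


def softmax_policy(action_values, temperature):
    """Action probabilities proportional to exp(Q / temperature)."""
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(action_values)
    weights = [math.exp((q - top) / temperature) for q in action_values]
    total = sum(weights)
    return [w / total for w in weights]


def greedy_policy(action_values):
    """Put all probability on the highest-valued action."""
    best = max(range(len(action_values)), key=lambda a: action_values[a])
    return [1.0 if a == best else 0.0 for a in range(len(action_values))]


# Hypothetical action values for a single state (assumption, not from the paper).
q = [1.0, 1.1, 0.5]
for temperature in (1.0, 0.1, 0.01):
    probs = softmax_policy(q, temperature)
    print(temperature, [round(p, 4) for p in probs])
print("greedy", greedy_policy(q))

# Two nearly identical value estimates that disagree on the best action.
q_before = [1.00, 1.01]
q_after = [1.01, 1.00]
print("greedy flips:", greedy_policy(q_before), greedy_policy(q_after))
print("soft barely moves:",
      softmax_policy(q_before, 1.0), softmax_policy(q_after, 1.0))
```

Under these assumptions, lowering the temperature plays the role of moving from a smooth policy update toward a fully greedy one, so the sensitivity of the greedy policy to small value-estimate changes appears only in the limit.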
Related resources
Monetary policy and exchange rate overshooting in Iran: A Vector Errors Correction (VEC) approach
The assumption of exchange rate overshooting holds a significant position in international macroeconomic discussion. This phenomenon is one of the abnormal behaviors of the exchange rate that occur in the short run. Dornbusch (1976) shows that because speed of equilibrium prices is slow relative to asset markets and commodity prices are sticky in the short run, However, over time, commodity prices will rise a...
Sticky Prices and Alternative Monetary Feedback Rules: How Robust Is the Overshooting Phenomenon?
The present paper incorporates a mechanism of rules-based central-bank interventions into a Dornbusch-type framework. We show that the implied reactions of exchange rates and interest rate differentials in response to a monetary shock depend crucially on the particular monetary policy feedback rule. The Dornbusch case of positively correlated and overshooting nominal and real exchange rates as ...
A thermal oscillation under a restorative forcing
The authors report an interdecadal oscillation in a wind- and thermally-driven ocean general circulation model (OGCM). The oscillation is tantalizing in that it occurs under a relatively strong thermal damping (26.3 W m⁻² K⁻¹). Examinations involving a two-dimensional OGCM, a simple thermal ‘flip-flop’ model, and a three-dimensional OGCM with and without the nonlinear effect of temperature in the st...
Angular Momentum Transport by Gravity waves in the Solar Interior
We present self-consistent numerical simulations of the sun’s convection zone and radiative interior using a two-dimensional model of its equatorial plane. The background reference state is a one-dimensional solar structure model. Turbulent convection in the outer convection zone continually excites gravity waves which propagate throughout the stable radiative interior and deposit their angular...
Macroeconomic Shocks and the Foreign Exchange Risk Premiums
Using a nonlinear structural Vector Autoregression model based on the general no-arbitrage condition, we examine the empirical relation between macroeconomic shocks and the foreign exchange risk premiums. We find that when the predictable excess returns from currency speculation are interpreted as time-varying risk premiums, more than 80% of its volatility can be accounted for by the same funda...
Journal title:
- Neural networks : the official journal of the International Neural Network Society
Volume 52
Publication date: 2014